Notably absent in previous research on inductive expert systems is the study of mean-risk trade-offs. Such trade-offs may be significant when there are asymmetries such as unequal classification costs, and uncertainties in classification and information acquisition costs. The objective of this research is to develop models to evaluate mean-risk trade-offs in value-based inductive approaches. We develop a combined mean-risk measure and incorporate it into the Risk-Based induction algorithm. The mean-risk measure has desirable theoretical properties (consistency and separability) and is supported by empirical results on decision making under risk. Simulation results using the Risk-Based algorithm demonstrate: (i) an order of magnitude performance difference between mean-based and risk-based algorithms and (ii) an increase in the performance difference between these algorithms as any one of risk aversion, uncertainty, or asymmetry increases, given modest thresholds of the other two factors.
Retrieval of a set of cases similar to a new case is a problem common to a number of machine learning approaches such as nearest neighbor algorithms, conceptual clustering, and case-based reasoning. A limitation of most case retrieval algorithms is their lack of attention to information acquisition costs. When information acquisition costs are considered, cost reduction is hampered by the practice of separating concept formation and retrieval strategy formation. To demonstrate this claim, we examine two approaches. The first approach separates concept formation and retrieval strategy formation. To form a retrieval strategy in this approach, we develop the CR<sub>1c</sub> (case retrieval loss criterion) algorithm that selects attributes in ascending order of expected loss. The second approach jointly optimizes concept formation and retrieval strategy formation using a cost-based variant of the ID3 algorithm (ID3<sub>c</sub>). ID3<sub>c</sub> builds a decision tree wherein attributes are selected using entropy reduction per unit information acquisition cost. Experiments with four data sets are described in which algorithm, attribute cost coefficient of variation, and matching threshold are factors. The experimental results demonstrate that (i) jointly optimizing concept formation and retrieval strategy formation has substantial benefits, and (ii) using cost considerations can significantly reduce information acquisition costs, even if concept formation and retrieval strategy formation are separated.
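The selection rule described for ID3<sub>c</sub>, entropy reduction per unit information acquisition cost, can be illustrated with a minimal sketch. This is not the authors' implementation: the function names, the toy data, and the cost values are all hypothetical, and the sketch assumes strictly positive costs and covers only the choice of a single splitting attribute, not tree construction.

```python
import math
from collections import Counter


def entropy(labels):
    """Shannon entropy (in bits) of a list of class labels."""
    total = len(labels)
    counts = Counter(labels)
    return -sum((n / total) * math.log2(n / total) for n in counts.values())


def information_gain(rows, labels, attribute):
    """Entropy reduction obtained by partitioning the cases on one attribute."""
    partitions = {}
    for row, label in zip(rows, labels):
        partitions.setdefault(row[attribute], []).append(label)
    remainder = sum(
        len(part) / len(labels) * entropy(part) for part in partitions.values()
    )
    return entropy(labels) - remainder


def select_attribute(rows, labels, costs):
    """Pick the attribute with the largest entropy reduction per unit cost.

    `costs` maps attribute name -> acquisition cost (assumed > 0).
    """
    return max(costs, key=lambda a: information_gain(rows, labels, a) / costs[a])


# Hypothetical toy data: "lab_test" separates the classes perfectly but is
# ten times as expensive to acquire as "symptom".
rows = [
    {"symptom": "a", "lab_test": "pos"},
    {"symptom": "a", "lab_test": "pos"},
    {"symptom": "a", "lab_test": "neg"},
    {"symptom": "b", "lab_test": "neg"},
]
labels = ["yes", "yes", "no", "no"]
costs = {"symptom": 1.0, "lab_test": 10.0}

chosen = select_attribute(rows, labels, costs)
```

On this toy data, plain entropy reduction would favor `lab_test`, while the cost-normalized criterion favors the cheaper `symptom` attribute, which is the kind of trade-off the abstract describes.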
Inductive expert systems typically operate with imperfect or noisy input attributes. We study design differences in inductive expert systems arising from implicit versus explicit handling of input noise. Most previous approaches use an implicit approach wherein inductive expert systems are constructed using input data of quality comparable to problems the system will be called upon to solve. We develop an explicit algorithm (ID3<sub>ecp</sub>) that uses a clean (without input errors) training set and an explicit measure of the input noise level, and compare it to a traditional implicit algorithm, ID3<sub>p</sub> (the ID3 algorithm with the pessimistic pruning procedure). The novel feature of the explicit algorithm is that it injects noise in a controlled rather than random manner in order to reduce the performance variance due to noise. We show analytically that the implicit algorithm has the same expected partitioning behavior as the explicit algorithm. However, the partitioning behavior of the explicit algorithm is shown to be more stable (i.e., lower variance) than that of the implicit algorithm. To extend the analysis to the predictive performance of the algorithms, a set of simulation experiments is described in which the average performance and coefficient of variation of performance of both algorithms are studied on real and artificial data sets. The experimental results confirm the analytical results and demonstrate substantial differences in stability of performance between the algorithms, especially as the noise level increases.
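The claim of equal expected partitioning behavior with lower variance can be illustrated with a small simulation. The sketch below is one possible reading of "controlled" noise injection, not the procedure used by ID3<sub>ecp</sub>: it assumes a binary attribute whose value is flipped with probability equal to the noise level, and it models controlled injection as assigning each case fractionally to both branches. All names and numbers are hypothetical.

```python
import random
from statistics import mean, pvariance


def random_partition_size(values, noise, rng):
    """Flip each binary value with probability `noise`, then count the
    cases that land in branch 1 (random noise injection)."""
    return sum((1 - v) if rng.random() < noise else v for v in values)


def controlled_partition_size(values, noise):
    """Expected size of branch 1: each case contributes weight (1 - noise)
    to its clean branch and weight `noise` to the other branch, so the
    result is deterministic (assumed form of controlled injection)."""
    return sum(v * (1 - noise) + (1 - v) * noise for v in values)


clean_values = [1] * 30 + [0] * 70   # hypothetical clean training attribute
noise_level = 0.2                    # hypothetical input noise level
rng = random.Random(0)               # fixed seed for repeatability

random_sizes = [
    random_partition_size(clean_values, noise_level, rng) for _ in range(2000)
]
controlled_size = controlled_partition_size(clean_values, noise_level)

print("controlled size:", controlled_size)
print("mean of random sizes:", mean(random_sizes))
print("variance of random sizes:", pvariance(random_sizes))
```

Under these assumptions the randomly injected partitions average out to the controlled value, but individual runs scatter around it, whereas the controlled computation yields the same partition every time.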
The data dictionary system is a documentation source that is useful for management reviews of existing and proposed systems, EDP audits, and system development functions. Early data dictionary systems had limitations that reduced their effectiveness and contributed to their limited usage. Many of these limitations have been or are being resolved with the result that evolving data dictionary systems offer many benefits to management and EDP auditors. This article evaluates the features, potential benefits, and limitations of data dictionary systems from the perspective of the EDP auditor.